Skip to content

C++: Map more expressions to OperandNodes#11781

Merged
MathiasVP merged 2 commits intogithub:mathiasvp/replace-ast-with-ir-use-usedataflowfrom
MathiasVP:as-expr-for-arrays
Jan 9, 2023
Merged

C++: Map more expressions to OperandNodes#11781
MathiasVP merged 2 commits intogithub:mathiasvp/replace-ast-with-ir-use-usedataflowfrom
MathiasVP:as-expr-for-arrays

Conversation

@MathiasVP
Copy link
Copy Markdown
Contributor

@MathiasVP MathiasVP commented Dec 22, 2022

This fixes a performance problem I observed on cmake. The problem happened in the getBufferSize's predicate, where we have a case that searches backwards through expressions using DataFlow::localExprFlowStep. The code that exhibited the performance problem looked something like:

void foo() {
  int a[65536];

  a[0] = ...;
  a[1] = ...;
  ...
  a[65535] = ...;

  for(int i = 0; i < ...; ++i) {
    use(a[i]);
  }
}

and, while DataFlow::localFlowStep would handle this code perfectly well, the DataFlow::localExprFlowStep predicate (which has an additional transitive closure internally to step to the nearest DataFlow::Node that has an asExpr() result) would explode because the node that received flow from one assignment (i.e., a[k] = ...;) to the next (i.e., a[k+1] = ...;) wouldn't be the node that had a result for asExpr().

This PR changes asExpr so that Expr flow will proceed as a[0] = ...; -> a[1] = ...; -> ... -> a[65536] = ...; -> use(a[i]).

@MathiasVP MathiasVP added the C++ label Dec 22, 2022
@MathiasVP MathiasVP requested a review from a team as a code owner December 22, 2022 11:14
@MathiasVP MathiasVP added the no-change-note-required This PR does not need a change note label Dec 22, 2022
@jketema
Copy link
Copy Markdown
Contributor

jketema commented Dec 22, 2022

I see result changes on DCA, is that expected?

@MathiasVP
Copy link
Copy Markdown
Contributor Author

MathiasVP commented Dec 22, 2022

I see result changes on DCA, is that expected?

It's not. I would guess that these are due to changes to locations (or duplicated paths), but I haven't verified this yet.

or
node.(IndirectOperand).isIRRepresentationOf(_, _)
exists(Instruction def |
unique( | | getAUse(def)) = node.getOperand() and
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for the unique?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want the Expr -> Node direction to be a one-to-many relation. If there happens to be more than one use of the operand, then we'll fall back to picking the InstructionNode with the defining Instruction as the node for the expression.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this was implicitly achieved before by the much more restrictive definition of this predicate?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, exactly. Previously we'd only map an expression to an operand when it was an argument of (or a result of) a function call. And in both cases (at least the argument case) there should always be a unique use.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think so. When there's a unique use this predicate kicks in and ensures that that the Call expression is mapped to the dataflow node (and the Call expression has a pretty toString like call to getenv). And when there isn't a unique use we get the Instruction's Opcode's toString which is a lot less user-friendly.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that was on my list to fix anyway, so no need to solve it here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ I think this is one of the issues we should tackle as part of the "prettify dataflow paths" issue 😄.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that was on my list to fix anyway, so no need to solve it here.

Ah, perfect!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, the prettifying issue that was what I was referring to 😄

Copy link
Copy Markdown
Contributor

@jketema jketema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me in principle, but we need to have a closer look at the DCA alert changes.

@MathiasVP
Copy link
Copy Markdown
Contributor Author

I see result changes on DCA, is that expected?

It's not. I would guess that these are due to changes to locations (or duplicated paths), but I haven't verified this yet.

Hmm... I looked at most of the new SAMATE results, and they are all duplicated paths. I still need to look at the real world results, though.

@MathiasVP
Copy link
Copy Markdown
Contributor Author

MathiasVP commented Jan 9, 2023

Here's a rundown of the result changes:

cpp/potential-system-data-exposure and cpp/cleartext-storage-file

  • All new results are duplicates (😭)

cpp/uncontrolled-allocation-size

  • The lost results are because we now correctly hit the sanitizer in the query (which blocks the flow) 🎉

cpp/unbounded-write

  • The lost results are because we now correctly hit the (kinda bad) sanitizers in DefaultTaintTracking (which blocks the flow) 🎉

@MathiasVP MathiasVP merged commit 0f93e5c into github:mathiasvp/replace-ast-with-ir-use-usedataflow Jan 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C++ no-change-note-required This PR does not need a change note

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants